Using Restricted Random Walks for Library Recommendations

Authors

  • Markus Franke
  • Andreas Geyer-Schulz
Abstract

Recommendations are a valuable help for library users who, for example, want to gain an overview of the important literature on a certain topic. We describe a new method for generating document recommendations based on clustering purchase histories. The algorithm presented here is called restricted random walk (RRW) clustering and has proven to cope efficiently with large data sets. Furthermore, as will be shown, the clusters are very well suited for giving recommendations in the context of library usage data.

1 Motivation and Introduction

Services like amazon.com’s “Customers who bought this book also bought . . .” benefit all parties involved: the customer receives assistance in finding his way through the range of books offered by the shop, and the bookseller can increase its sales by proposing complementary literature to its customers [1].

Technically, a recommender service can be implemented in different ways. We will present an innovative approach that is based on a fast clustering algorithm for large object sets [2] and makes use of product cross-occurrences in purchase histories. In our case, the purchase histories are those of users of the Online Public Access Catalogue (OPAC) of the university library at Karlsruhe, and a purchase is the viewing of a document’s detail page in the WWW interface of the OPAC. A cross-occurrence between two documents is given when their detail pages have been viewed together in one user session. Following the standard assumption for behavior-based recommender systems, we assume that a high number of cross-occurrences hints at a high complementarity of two documents, which we can interpret in the recommender context as similarity.

The paper is structured as follows: We start by outlining existing recommender systems and cluster algorithms in section 2.
In section 3 we will present the restricted random walk clustering algorithm before discussing the generation of recommendations from clusters in section 4. Results will be shown in section 5, and a conclusion as well as an outlook on further research topics are given in section 6.

2 Recommender systems and Cluster Algorithms for Library OPACs

General classification schemes for recommender systems have been presented by Resnick and Varian [3], by Schafer et al. [1], and by Gaul et al. [4].

Franke M. and Geyer-Schulz A. (2005). Using Restricted Random Walks for Library Recommendations. In Proceedings of the 1st International Workshop on Web Personalisation, Recommender Systems and Intelligent User Interfaces, pages 107-115. DOI: 10.5220/0001412001070115. Copyright © SciTePress

The systems we will scrutinize more closely are so-called implicit recommender systems, which generate recommendations from user protocol data (e.g. purchase histories at the (online) store, Usenet postings or bookmarks) without the need for user cooperation. This distinction between implicit and explicit recommender systems is important: since no additional customer effort is necessary to obtain these recommendations, incentive-related problems like free riding or bias are minimal. This has been discussed e.g. by Geyer-Schulz et al. [5] and Nichols [6].

All recommender systems mentioned here have in common that they do not perform content analysis, contrary to information retrieval based methods as described for instance by Semeraro [7], Yang [8] and others. This is important since in a hybrid library like the one in Karlsruhe, only a fraction of the corpus is available in digital form.

Currently, two methods are widely used to generate recommendations from purchase histories: a straightforward one employed for instance by amazon.com, and an approach based on the logarithmic series distribution (LSD) model from Ehrenberg’s repeat buying theory [9], used for example at the university library in Karlsruhe [10].
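As an illustration of the cross-occurrence counts on which both methods build, the following is a minimal Python sketch, not the implementation used at Karlsruhe or at amazon.com. It assumes that each user session is available as a list of document identifiers; the function names `cross_occurrences` and `most_coviewed`, the session data, and the tie-breaking rule are hypothetical choices made for this example.

```python
from collections import Counter
from itertools import combinations


def cross_occurrences(sessions):
    """Count, for every unordered document pair, the sessions in which both were viewed."""
    counts = Counter()
    for session in sessions:
        # A document viewed several times in one session still counts only once.
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def most_coviewed(counts, doc, k=3):
    """Return up to k documents with the highest cross-occurrence count with doc."""
    partners = Counter()
    for (a, b), n in counts.items():
        if a == doc:
            partners[b] = n
        elif b == doc:
            partners[a] = n
    # Sort by descending count, then by identifier, so that ties are deterministic
    # (an assumption of this sketch, not something the paper prescribes).
    return sorted(partners, key=lambda d: (-partners[d], d))[:k]


# Hypothetical sessions; the identifiers are made up for illustration.
sessions = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_a", "doc_b"],
    ["doc_b", "doc_d"],
    ["doc_a", "doc_c", "doc_a"],
]
counts = cross_occurrences(sessions)
recommended = most_coviewed(counts, "doc_a", k=2)
```

Storing only the pairs that actually occur keeps the representation sparse, which matters because most document pairs never share a session.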
The first approach is to recommend the books that have been bought (or viewed) most often together with the book the customer is currently considering. The challenges of this idea lie mainly in its implementation for large data sets, even if the matrix of common purchases is quite sparse.

Another, more sophisticated model makes use of Ehrenberg’s repeat buying theory [9, 10]. Its advantage lies in a noticeably better quality of the recommendations, because the underlying assumption of a logarithmic series distribution makes it possible to distinguish between random and meaningful cross-occurrences in a more robust way.

However, these recommender systems only take into account direct neighborhoods in the similarity graph generated by the purchase histories. Any extension that includes the neighbors of the neighbors in the recommendations quickly becomes computationally intractable. This is not the case with cluster-based recommender systems: the recommendations contain not only the documents directly related to each other, but the clusters also account for indirect relations where this is necessary.

For a general overview of clustering and classification algorithms, we refer to Duda et al. [11] or Bock [12]. In the past there have been some proposals for recommender systems or collaborative filtering based on cluster algorithms [13, 14]. We chose restricted random walk clustering for two reasons: its ability to cope with large data sets, which will be discussed in section 3.4, and the quality of its clusters with respect to library purchase histories.

Viegener [15] extensively investigated the use of cluster algorithms for the construction of a library’s thesaurus. On the one hand, Viegener’s results are encouraging because he found semantically meaningful patterns in library data.
On the other hand, all standard cluster algorithms proved to be computationally expensive: Viegener’s results were computed on a supercomputer at the Universität Karlsruhe that is not available for routine library operations. Besides, the quality of the clusters generated by the algorithms scrutinized may not be sufficient for recommendations. Single linkage clustering, for instance, is prone to bridging, i.e. to connecting independent clusters via a bridge element, an object located between the clusters.
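The bridging effect can be made concrete with a small Python sketch. It is an illustration only and assumes one-dimensional points, absolute difference as the distance, and a fixed distance threshold at which the single linkage hierarchy is cut; the function name `single_linkage` and the data are hypothetical. Cutting single linkage at a threshold is equivalent to taking the connected components of the graph that links all points within that distance, which is what the union-find structure below computes.

```python
def single_linkage(points, threshold):
    """Single-linkage clusters at a fixed cut: two clusters are merged as soon as
    any pair of their points lies at distance <= threshold."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if abs(points[i] - points[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, p in enumerate(points):
        clusters.setdefault(find(i), []).append(p)
    return sorted(sorted(c) for c in clusters.values())


# Two tight groups on a line whose closest members are 3.0 apart (made-up data).
separate = [0.0, 0.5, 1.0, 4.0, 4.5, 5.0]
# The same groups plus a single bridge element halfway between them.
bridged = separate + [2.5]

without_bridge = single_linkage(separate, threshold=1.5)
with_bridge = single_linkage(bridged, threshold=1.5)
```

With the data above, `without_bridge` contains the two groups as separate clusters, while in `with_bridge` the single element at 2.5 chains both groups into one cluster, although no member of one group is close to any member of the other.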


Publication date: 2010